Maximum likelihood sub-band adaptation for robust speech recognition

Authors

  • Donglai Zhu
  • Satoshi Nakamura
  • Kuldip K. Paliwal
  • Ren-Hua Wang
Abstract

Noise-robust speech recognition has become an important area of research in recent years. Current speech recognition systems use Mel-frequency cepstral coefficients (MFCCs) as recognition features. When the speech signal is corrupted by narrow-band noise, the entire MFCC feature vector is corrupted, so the recognizer cannot exploit the frequency-selective property of the noise to become more robust. Recently, a number of sub-band speech recognition approaches have been proposed in the literature, in which the full-band power spectrum is divided into several sub-bands that are then combined according to their reliability. In conventional sub-band approaches, the reliability can only be set experimentally or estimated during training, so it may not match the observed data, which often degrades performance. We propose a novel sub-band approach in which frequency sub-bands are multiplied by weighting factors, then combined and converted to cepstra; in our experiments these cepstra proved more robust than both full-band and conventional sub-band cepstra. Furthermore, the weighting factors can be estimated by maximum likelihood adaptation in order to minimize the mismatch between the trained models and the observed features. We evaluated our methods on the AURORA2 and Resource Management tasks and obtained consistent performance improvements on both. © 2005 Published by Elsevier B.V.
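The abstract does not spell out where the weights are applied or how the estimator is built, so the following is only a minimal sketch under assumptions of our own, not the authors' method. We assume each weight scales the log filterbank energies of one contiguous group of channels (the split in the hypothetical helper `band_of`), the cepstra come from an unnormalized DCT-II, and the acoustic model is collapsed to a single diagonal Gaussian instead of an HMM. Under these assumptions the cepstra are linear in the weights, c(w) = M w, so maximizing the Gaussian log-likelihood over w reduces to a weighted least-squares problem with a closed-form solution. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def dct2(x: Sequence[float], n_ceps: int) -> List[float]:
    """Unnormalized DCT-II: c_j = sum_k x_k * cos(pi * j * (k + 0.5) / K)."""
    K = len(x)
    return [sum(x[k] * math.cos(math.pi * j * (k + 0.5) / K) for k in range(K))
            for j in range(n_ceps)]


def band_of(n_chan: int, n_bands: int) -> List[int]:
    """Assumption: assign filterbank channels to n_bands contiguous sub-bands."""
    return [k * n_bands // n_chan for k in range(n_chan)]


def weighted_cepstra(log_e: Sequence[float], weights: Sequence[float],
                     n_ceps: int) -> List[float]:
    """Scale each channel's log energy by its sub-band weight, then apply the DCT."""
    bands = band_of(len(log_e), len(weights))
    scaled = [weights[bands[k]] * log_e[k] for k in range(len(log_e))]
    return dct2(scaled, n_ceps)


def band_matrix(log_e: Sequence[float], n_bands: int, n_ceps: int) -> List[List[float]]:
    """M[j][b] such that weighted_cepstra(log_e, w)[j] == sum_b M[j][b] * w[b]."""
    K = len(log_e)
    bands = band_of(K, n_bands)
    M = [[0.0] * n_bands for _ in range(n_ceps)]
    for j in range(n_ceps):
        for k in range(K):
            M[j][bands[k]] += log_e[k] * math.cos(math.pi * j * (k + 0.5) / K)
    return M


def solve(A: List[List[float]], y: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(y)
    aug = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][c] * x[c] for c in range(i + 1, n))) / aug[i][i]
    return x


def ml_weights(frames: Sequence[Sequence[float]], mean: Sequence[float],
               var: Sequence[float], n_bands: int) -> List[float]:
    """Maximize sum_t log N(M_t w; mean, diag(var)) over w.

    Setting the gradient to zero gives the normal equations A w = y with
    A = sum_t M_t^T V^-1 M_t and y = sum_t M_t^T V^-1 mean.
    """
    n_ceps = len(mean)
    A = [[0.0] * n_bands for _ in range(n_bands)]
    y = [0.0] * n_bands
    for log_e in frames:
        M = band_matrix(log_e, n_bands, n_ceps)
        for j in range(n_ceps):
            for a in range(n_bands):
                y[a] += M[j][a] * mean[j] / var[j]
                for b in range(n_bands):
                    A[a][b] += M[j][a] * M[j][b] / var[j]
    return solve(A, y)


if __name__ == "__main__":
    # Toy 8-channel frame; the "model" mean is built from known weights [1.0, 0.3].
    log_e = [math.log(1.0 + k) + 2.0 for k in range(8)]
    mean = weighted_cepstra(log_e, [1.0, 0.3], n_ceps=6)
    w_hat = ml_weights([log_e], mean, [1.0] * 6, n_bands=2)
    print([round(w, 3) for w in w_hat])  # → [1.0, 0.3]
```

With all weights equal to 1 the features reduce to ordinary cepstra, and a weight near 0 removes a sub-band's contribution, which is one way to read "reliability". With a full HMM, the same quadratic objective would appear inside each EM iteration, with state occupancy probabilities weighting the per-frame terms; that extension is not shown here.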

Similar resources

Maximum likelihood sub-band weighting for robust speech recognition

Sub-band speech recognition approaches have been proposed for robust speech recognition, where full-band power spectra are divided into several sub-bands and then likelihoods or cepstral vectors of the sub-bands are merged depending on their reliability. In conventional sub-band approaches, correlations across the sub-bands are not modeled and the merging weights can only be set empirically ...

IMPROVED HMM ENTROPY FOR ROBUST SUB-BAND SPEECH RECOGNITION (ThuPmOR1)

In recent years, sub-band speech recognition has been found useful in robust speech recognition, especially for speech signals contaminated by band-limited noise. In sub-band speech recognition, full band speech is divided into several frequency sub-bands and then sub-band feature vectors or their generated likelihoods by corresponding sub-band recognizers are combined to give the result of rec...

Sub-band weighted projection measure for robust sub-band speech recognition

In recent years, sub-band speech recognition has been found useful in robust speech recognition, especially for speech signals contaminated by band-limited noise. In sub-band speech recognition, full band speech is divided into several frequency sub-bands and then sub-band feature vectors or their generated likelihoods by corresponding sub-band recognizers are combined to give the result of rec...

Linear transformations in sub-band groups for speech recognition

Linear transforms have been demonstrated to successfully achieve on-line speaker and environmental adaptation for robust recognition. This paper explores the gains in computational speed, speaker adaptation convergence rate and recognition performance obtained through the use of multi-resolution sub-band linear transforms in speech recognition. A useful feature of multiresolution processing is ...

Robust feature space adaptation for telephony speech recognition

Speaker adaptation is critical for modern speech recognition systems. Due to the computational and multi-channel model sharing considerations, the use of model adaptation techniques is limited in telephony speech recognition systems. On the other hand, feature space adaptation methods such as feature space maximum likelihood linear regression (fMLLR) are efficient approaches suitable for teleph...

Journal:
  • Speech Communication

Volume 47

Publication date: 2005